MetDecode: methylation-based deconvolution of cell-free DNA for noninvasive multi-cancer typing

Antoine Passemiers et al.

Presenter: Tony Liang


October 31, 2024

Background

Circulating cell-free DNA in the bloodstream¹
  • Circulating cell-free DNA (cfDNA) consists of DNA fragments released into the bloodstream

  • The fraction of cfDNA released from cancer or tumor cells is called circulating tumor DNA (ctDNA)

  • cfDNA carries genetic and epigenetic changes, which can reveal the cells from which it originated

    • This can help identify different types of cancer

Detecting the origin of cfDNA

Current cfDNA screening tests can detect the presence of abnormal signals but cannot determine the cancer type or the tumor’s tissue of origin (TOO)

  • Computational methods use epigenetic markers such as methylation profiles to deduce the origin of cfDNA fragments
    • They “deconvolute” the plasma cfDNA composition
    • Approaches vary: probabilistic models, linear models, matrix factorization, etc.
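
As a rough illustration of the linear-model family mentioned above, the sketch below solves a non-negative least squares problem with projected gradient descent. It is a generic baseline, not the MetDecode algorithm. The function name `nnls_projected_gradient`, the random atlas, and the mixture proportions are all hypothetical.

```python
import numpy as np

def nnls_projected_gradient(atlas, sample, n_iter=2000):
    """Minimise ||sample - a @ atlas||^2 subject to a >= 0.

    atlas  : (m, p) methylation ratios, m cell types by p marker regions
    sample : (p,)   methylation ratios observed in one cfDNA sample
    Returns the m contributions, rescaled to sum to 1.
    """
    m = atlas.shape[0]
    a = np.full(m, 1.0 / m)                        # start from equal contributions
    # Step size 1/L, where L = 2 * sigma_max(atlas)^2 bounds the curvature.
    step = 1.0 / (2.0 * np.linalg.norm(atlas, 2) ** 2)
    for _ in range(n_iter):
        grad = -2.0 * (sample - a @ atlas) @ atlas.T
        a = np.maximum(a - step * grad, 0.0)       # project onto a >= 0
    return a / a.sum()

# Hypothetical toy data: 3 cell types, 50 marker regions, noiseless mixture.
rng = np.random.default_rng(0)
atlas = rng.uniform(0.0, 1.0, size=(3, 50))
true_props = np.array([0.7, 0.2, 0.1])
sample = true_props @ atlas

estimate = nnls_projected_gradient(atlas, sample)
print(np.round(estimate, 3))
```

On noiseless synthetic data like this the estimate should recover the mixing proportions. Real cfDNA data adds noise, missing markers, and contributors absent from the atlas.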

Limitations of existing methods

  • Cannot deconvolute multiple cancer tissues
  • Do not account for missing variables caused by an incomplete atlas
  • Do not allow full deconvolution of all cfDNA components, estimating cell proportions only

MetDecode

The authors (Passemiers et al. 2024) developed their own reference-based deconvolution method…

In some sense, it “combines” existing methodologies such as non-negative least squares (NNLS) and matrix factorization.

  • They also built a new reference atlas of tissue-specific methylation markers for 4 cancer tissues
    • Breast, ovarian, cervical and colorectal
  • With the option to extend the reference atlas with unknown methylation patterns on the fly

The main deconvolution algorithm

\[ f(A) = \sum_{i=1}^{n} \sum_{k=1}^{p} W_{ik} \, \Big| \underbrace{R_{ik}^{\text{(cfdna)}}}_{(1)} - \underbrace{\sum_{j=1}^{m} A_{ij} B_{jk}}_{(2)} \Big| \]

  1. Methylation ratios \(R^{\text{(cfdna)}}\)
  2. Reconstructed matrix, which approximates \((1)\)
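
The objective above can be evaluated in a few lines of NumPy. This minimal sketch only computes the loss and does not optimise it. It assumes that \(A\) is an \(n \times m\) matrix of contributions, \(B\) an \(m \times p\) matrix of methylation ratios per cell type, and \(W\) an \(n \times p\) matrix of weights. These shapes are inferred from the summation indices, and the function name and toy values are hypothetical.

```python
import numpy as np

def weighted_l1_loss(R_cfdna, A, B, W):
    """f(A) = sum_i sum_k W_ik * | R_cfdna_ik - sum_j A_ij B_jk |.

    Assumed shapes: R_cfdna (n, p), A (n, m), B (m, p), W (n, p).
    """
    reconstruction = A @ B                      # term (2) in the formula
    return float(np.sum(W * np.abs(R_cfdna - reconstruction)))

# Hypothetical toy example: n = 2 samples, m = 2 cell types, p = 3 markers.
A = np.array([[0.5, 0.5],
              [1.0, 0.0]])
B = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.5]])
W = np.ones((2, 3))

R_exact = A @ B                                 # perfectly explained mixture
loss_exact = weighted_l1_loss(R_exact, A, B, W)

R_noisy = R_exact.copy()
R_noisy[0, 0] += 0.1                            # perturb one marker in one sample
W_up = W.copy()
W_up[0, 0] = 2.0                                # give that entry twice the weight
loss_weighted = weighted_l1_loss(R_noisy, A, B, W_up)

print(loss_exact, loss_weighted)
```

Doubling the weight of an entry doubles the penalty for a residual there, so markers with larger weights dominate the fit.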

Some math behind how MetDecode addresses unknown cell-type contributors

To account for \(h\) unknown contributors in the cfDNA mixture, \(h\) extra rows are added to \(R^{\text{(atlas)}}\):

\[ R_{hk}^{\text{(atlas)}} = \begin{cases} R_k^{\text{lb}}, & e_k > 0 \\ R_k^{\text{ub}}, & \text{otherwise} \end{cases} \quad \text{where} \quad e_k = \operatorname{median}_i \Big( -R_{ik}^{\text{(cfdna)}} + \sum_{j} \alpha_{ij} R_{jk}^{\text{(atlas)}} \Big) \]
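
A minimal sketch of this rule for a single unknown contributor is shown below. The slides do not define \(R_k^{\text{lb}}\) and \(R_k^{\text{ub}}\), so the code assumes they are per-marker lower and upper bounds on the methylation ratio, set here to 0 and 1 for illustration. The function name and all toy values are hypothetical.

```python
import numpy as np

def unknown_contributor_row(R_cfdna, alpha, R_atlas, R_lb, R_ub):
    """Build one extra atlas row following the case rule above.

    R_cfdna : (n, p) cfDNA methylation ratios
    alpha   : (n, m) current contribution estimates
    R_atlas : (m, p) reference atlas
    R_lb, R_ub : (p,) assumed lower / upper bounds per marker region
    """
    residual = -R_cfdna + alpha @ R_atlas       # (n, p), sign as in the formula
    e = np.median(residual, axis=0)             # e_k, median over samples i
    return np.where(e > 0, R_lb, R_ub)

# Hypothetical toy data: 3 samples, 2 known cell types, 3 marker regions.
R_atlas = np.array([[0.2, 0.8, 0.5],
                    [0.6, 0.4, 0.5]])
alpha = np.full((3, 2), 0.5)
R_cfdna = np.array([[0.30, 0.70, 0.5],
                    [0.20, 0.90, 0.5],
                    [0.35, 0.65, 0.5]])
R_lb = np.zeros(3)                              # assumption: lower bound 0
R_ub = np.ones(3)                               # assumption: upper bound 1

row = unknown_contributor_row(R_cfdna, alpha, R_atlas, R_lb, R_ub)
extended_atlas = np.vstack([R_atlas, row])      # atlas with one extra row
print(row)
```

Under this reading, a positive \(e_k\) means the atlas reconstruction overshoots the observed methylation at marker \(k\), so the unknown contributor is assigned the lower bound there, and the upper bound otherwise.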

Evaluation metrics

  • Pearson correlation coefficient \(\rho\) and mean squared error (MSE) to evaluate MetDecode's estimates

  • Accuracy to evaluate multiclass cancer TOO prediction, and Cohen’s kappa to correct for chance agreement in the multiclass setting

Some notation

\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

\[ \kappa = \frac{p_o - p_e}{1 - p_e}, \quad p_e = \frac{1}{N^2} \sum_{k=1}^{K} n_{k1} n_{k2} \]

where \(p_o\) is the observed agreement (here, the accuracy), \(N\) is the number of samples, \(n_{k1}\) is the number of times label \(k\) appears in the predictions, and \(n_{k2}\) is the number of times label \(k\) is the true label (Artstein and Poesio 2008)
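
The metrics above can be computed directly from their definitions. The sketch below is a plain NumPy illustration with hypothetical labels and proportions, not the evaluation code used by the authors.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error between true and estimated values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

def pearson(y, y_hat):
    """Pearson correlation coefficient rho."""
    return float(np.corrcoef(y, y_hat)[0, 1])

def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e), with p_e from the label counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    N = len(y_true)
    p_o = np.mean(y_true == y_pred)             # observed agreement (accuracy)
    labels = np.union1d(y_true, y_pred)
    p_e = sum(np.sum(y_pred == k) * np.sum(y_true == k) for k in labels) / N**2
    return float((p_o - p_e) / (1.0 - p_e))

# Hypothetical labels for six samples.
y_true = ["breast", "breast", "colorectal", "colorectal", "ovarian", "ovarian"]
y_pred = ["breast", "breast", "colorectal", "breast", "ovarian", "colorectal"]
kappa = cohen_kappa(y_true, y_pred)

# Hypothetical true vs. estimated proportions for one sample.
err = mse([0.1, 0.2, 0.7], [0.2, 0.2, 0.6])
rho = pearson([0.1, 0.2, 0.7], [0.2, 0.4, 1.4])
print(kappa, err, rho)
```

In the label example, accuracy is \(4/6\) and the chance agreement is \(p_e = (3 \cdot 2 + 2 \cdot 2 + 1 \cdot 2)/36 = 1/3\), so \(\kappa = 0.5\).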

Creation of reference atlas and marker selection

Heatmap displaying the methylation ratios of all the selected marker regions



  • READ had the greatest number of significant differentially methylated regions (DMRs)
    • In contrast, OV and CESC had the fewest, due to the lower coverage of these samples
  • Three variants of the deconvolution setting were devised to address this:
    1. Using all marker regions
    2. Using only significant marker regions
    3. Using the 23 most discriminative marker regions for each cell type, where 23 is roughly the minimum number of DMRs found for any cell type

Coverage-based weighting in MetDecode

Evaluation of the coverage-based weighting used in MetDecode


  • Evaluated over 50 simulation runs, each containing \(5000\) simulated cfDNA samples.

  • The Pearson correlation coefficient was then computed for each deconvolution algorithm

  • After averaging all correlation coefficients, MetDecode scored significantly higher than all other approaches

    • BUT not when looking at blood cell types only

Accuracy of complex mixture deconvolution

Deconvolution of genomic DNA methylation profiles

  • High correlation between complete blood count measurements and MetDecode deconvolution estimates

  • MetDecode without an unknown contributor outperformed NNLS in terms of average Pearson correlation and MSE

    • Adding an unknown contributor was expected to improve performance, but it did not

Identifying the TOO in cfDNA from cancer patients

Cancer type prediction comparisons based on highest cancer contributors
  • MetDecode with 1 unknown contributor performs best based on Cohen’s kappa

  • All methods perform equally poorly, with \(< 50\%\) accuracy, when predicting all samples

  • Closer performance when looking only at the \(19\) samples with tumor fraction \(> 3\%\)¹

    • Here MetDecode identifies the correct TOO in \(16/19\) cancer cases (\(84.2\%\) accuracy)
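
Predicting the cancer type from the highest cancer contributor amounts to an argmax over the estimated cancer contributions. The sketch below illustrates that rule and checks the \(16/19\) arithmetic. The contribution values are made up for illustration and the helper name `predict_too` is hypothetical.

```python
import numpy as np

# The four cancer tissues in the atlas described in the slides.
CANCER_TYPES = ["breast", "ovarian", "cervical", "colorectal"]

def predict_too(cancer_contributions):
    """Pick, per sample, the cancer type with the largest estimated contribution.

    cancer_contributions : (n_samples, 4) array, columns ordered as CANCER_TYPES
    """
    best = np.argmax(cancer_contributions, axis=1)
    return [CANCER_TYPES[j] for j in best]

# Hypothetical contributions for three cfDNA samples (not real data).
contributions = np.array([
    [0.06, 0.01, 0.00, 0.02],
    [0.00, 0.00, 0.01, 0.09],
    [0.01, 0.04, 0.02, 0.00],
])
predictions = predict_too(contributions)
print(predictions)

# Accuracy reported on the slide: 16 correct out of 19 cases.
accuracy_percent = round(100 * 16 / 19, 1)
print(accuracy_percent)
```

A rule like this always returns some cancer type, even when every cancer contribution is close to zero, which may partly explain the low accuracy on samples with a small tumor fraction.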

Conclusion

How could one utilize cfDNA?

cfDNA epigenetic signatures can be used to deduce TOO or cancer type

MetDecode is an algorithm that estimates the contributions and type of cancer in a cfDNA sample

  • It models unknown contributors not present in the reference atlas

  • And accounts for coverage of each marker region to alleviate potential sources of noise

Limitations and Future Direction

  • Limited number of cfDNA samples for the different cancer types
    • 93 samples in total: 4 cervical, 13 ovarian, and the rest breast and colorectal
  • Deconvoluting and defining the TOO will help oncologists identify the tumor and direct treatment
    • Especially when invasive examinations and radiological investigations are not ideal

Some comments

  • Why does the weighting approach improve deconvolution accuracy only on cancer components and not on blood cell types?

  • Why does adding an extra unknown contributor sometimes yield better results and sometimes not?

  • Cell type deconvolution still seems hard (low accuracy in predicting cancer type); what is the next step?

  • As an aside, can you always just combine existing approaches to get a “new” method?

Thanks!

References

Adalsteinsson, Viktor A, Gavin Ha, Samuel S Freeman, Atish D Choudhury, Daniel G Stover, Heather A Parsons, Gregory Gydush, et al. 2017. “Scalable Whole-Exome Sequencing of Cell-Free DNA Reveals High Concordance with Metastatic Tumors.” Nature Communications 8 (1): 1324.
Artstein, Ron, and Massimo Poesio. 2008. “Inter-Coder Agreement for Computational Linguistics.” Computational Linguistics 34 (4): 555–96.
Passemiers, Antoine, Stefania Tuveri, Dhanya Sudhakaran, Tatjana Jatsenko, Tina Laga, Kevin Punie, Sigrid Hatse, et al. 2024. “MetDecode: Methylation-Based Deconvolution of Cell-Free DNA for Noninvasive Multi-Cancer Typing.” Bioinformatics 40 (9): btae522.